Abstract: 

Dynamic migration of lightweight threads supports both data locality and load balancing. 
However, migrating threads that contain pointers referencing data in both the stack and 
heap remains an open problem. We describe a technique by which threads with pointers 
referencing both stack and non-shared heap data can be migrated such that the 
pointers remain valid after migration. As a result, threads containing pointers can now 
be migrated between processors in a homogeneous distributed memory environment 

The Augmint multiprocessor simulation toolkit for Intel x86 architectures 
Reference Cited: 13 
Inspec Accession Number: 5437709 

Abstract: 

Most publicly available simulation tools only simulate RISC architectures. These tools 
cannot capture the instruction mix and memory reference patterns of CISC architectures. 
We present an overview of Augmint, an execution driven multiprocessor simulation 
toolkit that fills this gap by supporting Intel x86 architectures. Augmint also supports 
trace driven simulation for uniprocessors as well as multiprocessors, with minor effort on 
the part of simulator developers. Augmint runs m4 macro extended C and C++ 
applications such as those in the SPLASH and SPLASH-2 benchmark suites. Augmint 
supports a thread based programming model with shared global address space and 
private stack space. Augmint supports a simulator interface compatible with that of the 
MINT simulation toolkit for MIPS architectures, thus allowing the reuse of most 
architecture simulators written for MINT. Augmint simulations run on x86 based 
uniprocessor systems under Unix or Windows NT. The source code of Augmint is publicly 
available from http://www.csrd.uiuc.edu/iacoma/augmint 

We present a memory management scheme for Java based on thread-local heaps. Assuming 
most objects are created and used by a single thread, it is desirable to free the memory 
manager from redundant synchronization for thread-local objects. Therefore, in our scheme 
each thread receives a partition of the heap in which it allocates its objects and in which it 
does local garbage collection without synchronization with other threads. We dynamically 
monitor to determine which objects are local and whi ... 

Keywords: JVM, Java, garbage collection, local garbage collection, locality, scalable 
garbage collection 

Greg Bronevetsky, Daniel Marques, Keshav Pingali, Peter Szwed, Martin Schulz 
October 2004 Proceedings of the 11th international conference on Architectural 
support for programming languages and operating systems 

Full text available: pdf (235.77 KB) Additional Information: full citation, abstract, references, index terms 

Trends in high-performance computing are making it necessary for long-running applications 
to tolerate hardware faults. The most commonly used approach is checkpoint and restart 
(CPR) - the state of the computation is saved periodically on disk, and when a failure occurs, 
the computation is restarted from the last saved state. At present, it is the responsibility of 
the programmer to instrument applications for CPR. Our group is investigating the use of 
compiler technology to instrument codes to ... 

Keywords: checkpointing, fault-tolerance, openMP, shared-memory programs 

on Memory management, volume 36 issue 1 
Full text available: pdf (1.33 MB) Additional Information: full citation, abstract, citings, index terms 

Java uses garbage collection (GC) for the automatic reclamation of computer memory no 
longer required by a running application. GC implementations for Java Virtual Machines 
(JVM) are typically designed for single processor machines, and do not necessarily perform 
well for a server program with many threads running on a multiprocessor. We designed and 
implemented an on-the-fly GC, based on the algorithm of Doligez, Leroy and Gonthier [13, 
12] (DLG), for Java in this environment. An on-the-f ... 

Keywords: Java, concurrent garbage collection, garbage collection, memory management, 
on-the-fly garbage collection, programming languages 



4 



Cormac Flanagan, Stephen N Freund 

January 2004 ACM SIGPLAN Notices, Proceedings of the 31st ACM SIGPLAN-SIGACT 
symposium on Principles of programming languages, volume 39 issue 1 

Ensuring the correctness of multithreaded programs is difficult, due to the potential for 
unexpected interactions between concurrent threads. Much previous work has focused on 
detecting race conditions, but the absence of race conditions does not by itself prevent 
undesired thread interactions. We focus on the more fundamental non-interference property 
of atomicity; a method is atomic if its execution is not affected by and does not interfere 
with concurrently-executing threads. Atomic me ... 

Keywords: atomicity, dynamic analysis, reduction 



5 Cycles to recycle: garbage collection to the IA-64 

Richard L. Hudson, J. Elliot Moss, Sreenivas Subramoney, Weldon Washburn 
October 2000 ACM SIGPLAN Notices, Proceedings of the 2nd international symposium 
on Memory management, volume 36 issue 1 

Full text available: pdf (1.25 MB) Additional Information: full citation, abstract, citings, index terms 

The IA-64, Intel's 64-bit instruction set architecture, exhibits a number of interesting 
architectural features. Here we consider those features as they relate to supporting garbage 
collection (GC). We aim to assist GC and compiler implementors by describing how one may 
exploit features of the IA-64. Along the way, we record some previously unpublished object 
scanning techniques, and offer novel ones for object allocation (suggesting some simple 
operating system support that would simplify it ... 

Tian F. Lim, Przemyslaw Pardyak, Brian N. Bershad 

October 1998 ACM SIGPLAN Notices, Proceedings of the 1st international symposium 
on Memory management, volume 34 issue 3 

Full text available: pdf (1.53 MB) Additional Information: full citation, abstract, references, citings, index terms 

Garbage collectors used in embedded systems such as Personal Java and Inferno or in 
operating systems such as SPIN must operate with limited resources and minimize their 
impact on application performance. Consequently, they must maintain short real-time 
pauses, low overhead, and a small memory footprint. Most garbage collectors, including the 
Treadmill algorithm, are inadequate because they sacrifice space for time. We have 
implemented a new Treadmill variant that provides good memory utilizatio ... 

Ole Agesen, Alex Garthwaite 

October 2000 ACM SIGPLAN Notices, Proceedings of the 2nd international symposium 
on Memory management, volume 36 issue 1 

Full text available: pdf Additional Information: full citation, abstract, citings, index terms 

The performance of automatic memory management may be improved if the policies used in 
allocating and collecting objects had knowledge of the lifetimes of objects. To date, 
approaches to the pretenuring of objects in older generations have relied on profile-driven 
feedback gathered from trace runs. This feedback has been used to specialize allocation 
sites in a program. These approaches suffer from a number of limitations. We propose an 
alternative that through efficient sampling of objects a ... 

Write barrier elision for concurrent garbage collectors 

Martin T. Vechev, David F. Bacon 

October 2004 Proceedings of the 4th international symposium on Memory management 

Full text available: pdf Additional Information: full citation, abstract, references, index terms 

Concurrent garbage collectors require write barriers to preserve consistency, but these 
barriers impose significant direct and indirect costs. While there has been a lot of work on 
optimizing write barriers, we present the first study of their elision in a concurrent collector. 
We show conditions under which write barriers are redundant, and describe how these 
conditions can be applied to both incremental update or snapshot-at-the-beginning barriers. 
We then evaluate the potential for write b ... 

Keywords: concurrent garbage collection, write barrier 



Supporting dynamic data structures on distributed-memory machines 
Anne Rogers, Martin C. Carlisle, John H. Reppy, Laurie J. Hendren 

March 1995 ACM Transactions on Programming Languages and Systems (TOPLAS), 
Volume 17 Issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review 

Compiling for distributed-memory machines has been a very active research area in recent 
years. Much of this work has concentrated on programs that use arrays as their primary 
data structures. To date, little work has been done to address the problem of supporting 
programs that use pointer-based dynamic data structures. The techniques developed for 
supporting SPMD execution of array-based programs rely on the fact that arrays are 
statically defined and directly addressable. Recursive data s ... 

Keywords: dynamic data structures 



10 A generational mostly-concurrent garbage collector 
Tony Printezis, David Detlefs 

October 2000 ACM SIGPLAN Notices, Proceedings of the 2nd international symposium 
on Memory management, volume 36 issue 1 

Full text available: pdf (1.67 MB) Additional Information: full citation, abstract, citings, index terms 

This paper reports our experiences with a mostly-concurrent incremental garbage collector, 
implemented in the context of a high performance virtual machine for the Java™ 
programming language. The garbage collector is based on the "mostly parallel" collection 
algorithm of Boehm et al. and can be used as the old generation of a generational memory 
system. It overloads efficient write-barrier code already generated to support generational 
garbage collection to also ident ... 

Simon Marlow, Simon Peyton Jones, Wolfgang Thaller 

September 2004 Proceedings of the ACM SIGPLAN workshop on Haskell 

Full text available: pdf Additional Information: full citation, abstract, references, index terms 

A Haskell system that includes both the Foreign Function Interface and the Concurrent 
Haskell extension must consider how Concurrent Haskell threads map to external Operating 
System threads for the purposes of specifying in which thread a foreign call is made. Many 
concurrent languages take the easy route and specify a one-to-one correspondence between 
the language's own threads and external OS threads. However, OS threads tend to be 
expensive, so this choice can limit the performance and scalabi ... 

Operating systems security 

Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, Dan Boneh 
October 2004 Proceedings of the 11th ACM conference on Computer and 
communications security 

Full text available: pdf (193.68 KB) Additional Information: full citation, abstract, references, index terms 

Address-space randomization is a technique used to fortify systems against buffer overflow 
attacks. The idea is to introduce artificial diversity by randomizing the memory location of 
certain system components. This mechanism is available for both Linux (via PaX ASLR) and 
OpenBSD. We study the effectiveness of address-space randomization and find that its 
utility on 32-bit architectures is limited by the number of bits available for address 
randomization. In particular, we demonstrate a < ... 

Keywords: address-space randomization, automated attacks, diversity 

This paper presents a simple and efficient data flow algorithm for escape analysis of objects 
in Java programs to determine (i) if an object can be allocated on the stack; (ii) if an object 
is accessed only by a single thread during its lifetime, so that synchronization operations on 
that object can be removed. We introduce a new program abstraction for escape analysis, 
the connection graph, that is used to establish reachability relationships between objects 
and object ref ... 

An on-the-fly reference counting garbage collector for Java 
Yossi Levanoni, Erez Petrank 

October 2001 ACM SIGPLAN Notices, Proceedings of the 16th ACM SIGPLAN 
conference on Object oriented programming, systems, languages, and 
applications, volume 36 Issue 11 

Full text available: pdf (280.30 KB) Additional Information: full citation, abstract, references, citings, index terms 

Reference counting is not naturally suitable for running on multiprocessors. The update of 
pointers and reference counts requires atomic and synchronized operations. We present a 
novel reference counting algorithm suitable for a multiprocessor that does not require any 
synchronized operation in its write barrier (not even a compare-and-swap type of 
synchronization). The algorithm is efficient and may compete with any tracing algorithm. 

Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, Edward D. Lazowska 

November 1994 ACM Transactions on Computer Systems (TOCS), Volume 12 issue 4 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms 

This article explores memory sharing and protection support in Opal, a single-address-space 
operating system designed for wide-address (64-bit) architectures. Opal threads execute 
within protection domains in a single shared virtual address space. Sharing is simplified, 
because addresses are context independent. There is no loss of protection, because 
addressability and access are independent; the right to access a segment is determined by 
the protection domain in which a thread executes. T ... 

Keywords: 64-bit architectures, capability-based systems, microkernel operating systems, 
object-oriented database systems, persistent storage, protection, single-address-space 
operating systems, wide-address architectures 

Full text available: pdf (1.19 MB) 

We have built a parallel dialect of Scheme called STING that differs from its contemporaries 
in a number of important respects. STING is intended to be used as an operating system 
substrate for modern parallel programming languages. The basic concurrency management 
objects in STING are first-class lightweight threads of control and virtual processors (VPs). 
Unlike high-level concurrency structures, STING threads and VPs are not encumbered by 
complex synchronization protocols. ... 

Whole-program optimization for time and space efficient threads 
Dirk Grunwald, Richard Neves 

September 1996 Proceedings of the seventh international conference on Architectural 
support for programming languages and operating systems, volume 31, 30 issue 9, 5 

Full text available: pdf (1.11 MB) Additional Information: full citation, abstract, references, citings, index terms 

Modern languages and operating systems often encourage programmers to use threads, or 
independent control streams, to mask the overhead of some operations and simplify 
program structure. Multitasking operating systems use threads to mask communication 
latency, either with hardware devices or users. Client-server applications typically use 
threads to simplify the complex control-flow that arises when multiple clients are used. 
Recently, the scientific computing community has started using ... 

Luke K. McDowell, Susan J. Eggers, Steven D. Gribble 

June 2003 ACM SIGPLAN Notices , Proceedings of the ninth ACM SIGPLAN symposium 
on Principles and practice of parallel programming, volume 38 issue 10 

Simultaneous multithreading (SMT) represents a fundamental shift in processor capability. 
SMT's ability to execute multiple threads simultaneously within a single CPU offers 
tremendous potential performance benefits. However, the structure and behavior of 
software affects the extent to which this potential can be achieved. Consequently, just like 
the earlier arrival of multiprocessors, the advent of SMT processors prompts a needed re- 
evaluation of software that will run on them. This evaluation ... 



Keywords: runtime support, servers, simultaneous multithreading 

Many concurrent garbage collection (GC) algorithms have been devised, but few have been 
implemented and evaluated, particularly for the Java programming language. Sapphire is an 
algorithm we have devised for concurrent copying GC. Sapphire stresses minimizing the 
amount of time any given application thread may need to block to support the collector. In 
particular, Sapphire is intended to work well in the presence of a large number of application 
threads, on small- to medium-scale shared memor ... 

This paper describes a programming system called Amber that permits a single application 
program to use a homogeneous network of computers in a uniform way, making the 
network appear to the application as an integrated multiprocessor. Amber is specifically 
designed for high performance in the case where each node in the network is a shared- 
memory multiprocessor. Amber shows that support for loosely-coupled multiprocessing can 
be efficiently realized using an obje ... 

While dynamic linking has become an integral part of the run-time execution of modern 
programming languages, there is increasing recognition of the need for support for hot 
swapping of running modules, particularly in long-lived server applications. The interesting 
challenge for such a facility is to allow the new module to change the types exported by the 
original module, while preserving type safety. This paper describes a type-based approach 
to hot swapping running modules. The approach is bas ... 

Keywords: dynamic typing, hot swapping, module interconnection languages, shared 
libraries 

Argus is a programming language and system developed to support the construction and 
execution of distributed programs. This paper describes the implementation of Argus, with 
particular emphasis on the way we implement atomic actions, because this is where Argus 
differs most from other implemented systems. The paper also discusses the performance of 
Argus. The cost of actions is quite reasonable, indicating that action systems like Argus are 
practical. 

A parallel, incremental and concurrent GC for servers 

Yoav Ossia, Ori Ben-Yitzhak, Irit Goft, Elliot K. Kolodner, Victor Leikehman, Avi Owshanko 
May 2002 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2002 Conference 
on Programming language design and implementation, volume 37 issue 5 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms 



http://portal.acm.org/results.cfm?query=%2Blocal%20%2B%3Cand%3E%20%2Bglobal%20 1/14/05 

Results (page 2): +local +<and> +global +<and> +"thread stack" +<and> +heap 

Multithreaded applications with multi-gigabyte heaps running on modern servers provide 
new challenges for garbage collection (GC). The challenges for "server-oriented" GC include: 
ensuring short pause times on a multi-gigabyte heap, while minimizing throughput penalty, 
good scaling on multiprocessor hardware, and keeping the number of expensive multi-cycle 
fence instructions required by weak ordering to a minimum. We designed and implemented 
a fully parallel, incremental, mostly concurrent colle ... 

Keywords: JVM, Java, concurrent garbage collection, garbage collection, incremental 
garbage collection, weak ordering 

According to conventional wisdom, interfaces provide flexibility at the cost of performance. 
Most high-performance Java virtual machines today tightly integrate their core virtual 
machines with their just-in-time compilers and garbage collectors to get the best 
performance. The Open Runtime Platform (ORP) is unusual in that it reconciles high 
performance with the extensive use of well-defined interfaces between its components. ORP 
was developed to support experiments in dynamic compilation, garb ... 

Keywords: JVM, Java, dynamic compilation, garbage collection, interface design, interfaces, 
just-in-time compilation, modular components, virtual machine 

We call a garbage collector conservative if it has only partial information about the location 
of pointers, and is thus forced to treat arbitrary bit patterns as though they might be 
pointers, in at least some cases. We show that some very inexpensive, but previously 
unused techniques can have dramatic impact on the effectiveness of conservative garbage 
collectors in reclaiming memory. Our most significant observation is that static data that 
appears to point to the heap should not result i ... 

The deployment of Java as a concurrent programming language has created a critical need 
for high-performance, concurrent, and incremental multiprocessor garbage collection. We 
present the Recycler, a fully concurrent pure reference counting garbage collector that we 
have implemented in the Jalapeno Java virtual machine running on shared memory 
multiprocessors. 

While a variety of multiprocessor collectors have been proposed and some have been 
implemented, experimental dat ... 

We describe an implementation of a sizable subset of OpenMP on networks of workstations 
(NOWs). By extending the availability of OpenMP to NOWs, we overcome one of its primary 
drawbacks compared to MPI, namely lack of portability to environments other than 
hardware shared memory machines. In order to support OpenMP execution on NOWs, our 
compiler targets a software distributed shared memory system (DSM) which provides 
multi-threaded execution and memory consistency. This paper presents two contri ... 

Programs written in concurrent object-oriented languages, especially ones that employ 
thread-safe reusable class libraries, can execute synchronization operations (lock, notify, 
etc.) at an amazing rate. Unless implemented with utmost care, synchronization can become 
a performance bottleneck. Furthermore, in languages where every object may have its own 
monitor, per-object space overhead must be minimized. To address these concerns, we 
have developed a meta-lock to mediate access to synchro ... 

Keywords: concurrent threads, object-oriented language implementation, synchronization 

Jalapeno is a virtual machine for Java™ servers written in Java. A running Java program 
involves four layers of functionality: the user code, the virtual-machine, the operating 
system, and the hardware. By drawing the Java / non-Java boundary below the virtual 
machine rather than above it, Jalapeno reduces the boundary-crossing overhead and opens 
up more opportunities for optimization. To get Jalapeno started, a boot image of a ... 

High performance applications on shared memory machines have typically been written in a 
coarse grained style, with one heavyweight thread per processor. In comparison, 
programming with a large number of lightweight, parallel threads has several advantages, 
including simpler coding for programs with irregular and dynamic parallelism, and better 
adaptability to a changing number of processors. The programmer can express a new thread 
to execute each individual parallel task; the implementation dyn ... 

Keywords: Pthreads, dynamic scheduling, irregular parallelism, lightweight threads, 
multithreading, space efficiency 

Java 2 has a security architecture that protects systems from unauthorized access by mobile 
or statically configured code. The problem is in manually determining the set of security 
access rights required to execute a library or application. The commonly used strategy is to 
execute the code, note authorization failures, allocate additional access rights, and test 
again. This process iterates until the code successfully runs for the test cases in hand. Test 
cases usually do not cover all paths th ... 

The MANTIS MultimodAl system for NeTworks of In-situ wireless Sensors provides a new 
multithreaded embedded operating system integrated with a general-purpose single-board 
hardware platform to enable flexible and rapid prototyping of wireless sensor networks. The 
key design goals of MANTIS are ease of use, i.e. a small learning curve that encourages 
novice programmers to rapidly prototype novel sensor networking applications in software 
and hardware, as well as flexibility, ... 

Low-overhead message passing is critical to the performance of many applications. Active 
Messages reduce the software overhead for message handling: messages are run as 
handlers instead of as threads, which avoids the overhead of thread management and the 
unnecessary data copying of other communication models. Scheduling the execution of 
Active Messages is typically done by disabling and enabling interrupts, or by polling the 
network. This primitive scheduling control, combined with the fac ... 